Power efficiency in microprocessors

ABSTRACT

A method of reducing the power consumption of a microprocessor system is provided, wherein: said microprocessor system comprises a microprocessor ( 2 ) and a memory ( 4 ) connected by a bus ( 6 ); said memory ( 4 ) contains a plurality of data values, each represented by a number of bits, for transmission to said microprocessor ( 2 ) via the bus ( 6 ); and at least some of said data values contain unused bits; and wherein said method includes assigning values to said unused bits in such a way as to reduce the Hamming distance between successive data values by a greater extent than setting all of said unused bits to an arbitrary predetermined value.

The invention relates to improved power efficiency in microprocessors.

The concept of Hamming distance will first be described. The Hamming distance between two binary numbers is the count of the number of bits that differ between them. For example:

| Numbers in decimal | Numbers in binary (inc. leading zeros) | Hamming distance |
|---|---|---|
| 4 and 5 | 0100 and 0101 | 1 |
| 7 and 10 | 0111 and 1010 | 3 |
| 0 and 15 | 0000 and 1111 | 4 |
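The definition above can be sketched in a few lines of Python. The function name `hamming_distance` is an illustrative choice, not something taken from the specification; the three value pairs are the ones listed in the table.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions in which a and b differ."""
    # XOR leaves a 1 in every position where the two numbers disagree.
    return bin(a ^ b).count("1")


for a, b in [(4, 5), (7, 10), (0, 15)]:
    print(f"{a:04b} and {b:04b}: {hamming_distance(a, b)}")
```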

Hamming distance is related to power efficiency because of the way that binary numbers are represented by electrical signals. Typically a steady low voltage on a wire represents a binary 0 bit and a steady high voltage represents a binary 1 bit. A number will be represented using these voltage levels on a group of wires, with one wire per bit. Such a group of wires is called a bus. Energy is used when the voltage on a wire is changed. The amount of energy depends on the magnitude of the voltage change and the capacitance of the wire. The capacitance depends to a large extent on the physical dimensions of the wire. So when the number represented by a bus changes, the energy consumed depends on the number of bits that have changed (the Hamming distance) between the old and the new value, and on the capacitance of the wires.
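As a rough illustration of this dependence, the usual first-order CMOS switching model can be written as follows. This model is not stated in the specification; it is a common approximation that assumes every wire has the same capacitance and swings over the full voltage range. The symbols $C_{\text{wire}}$, $V_{\text{swing}}$ and $d_H$ are introduced here only for illustration.

```latex
% Energy dissipated by one wire changing state (first-order approximation)
E_{\text{wire}} \approx \tfrac{1}{2}\, C_{\text{wire}}\, V_{\text{swing}}^{2}

% Energy for one change of the bus value, where d_H is the Hamming distance
E_{\text{bus}} \approx \tfrac{1}{2}\, C_{\text{wire}}\, V_{\text{swing}}^{2}\; d_H(v_{\text{old}}, v_{\text{new}})
```

Under these assumptions the energy per bus update is proportional to the Hamming distance, which is why reducing the average distance reduces the average power.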

If one can reduce the average Hamming distance between successive values on a high-capacitance bus, keeping all other aspects of the system the same, the system's power efficiency will have been increased.

The capacitance of wires internal to an integrated circuit is small compared to the capacitance of wires fabricated on a printed circuit board due to the larger physical dimensions of the latter.

EP 0,926,596 describes a method of optimizing assembly code of a VLIW processor or other processor that uses multiple-instruction words, each of which comprises instructions to be executed on different functional units of the processor.

According to the invention there is provided a method of reducing the power consumption of a microprocessor system, a memory, a computer readable medium, computer programs, and a microprocessor system, as set out in the accompanying claims.

Embodiments of the invention will now be described, by way of example only, with reference to the accompanying FIGURE, which shows the interconnections between a microprocessor and its memory.

In the accompanying FIGURE, a microprocessor 2 is connected to memory 4 by a number of buses which are implemented on a printed circuit board (not shown).

The embodiments described here aim to reduce the average Hamming distance between successive values on the microprocessor-memory interface buses, shown as the lighter colored nets 6, 8, 10 and 12 in FIG. 1, as this will have a significant influence on power efficiency.

Even in systems where microprocessor and memory are incorporated into the same integrated circuit the capacitance of the wires connecting them will be larger than average, so even in this case reduction of average Hamming distance on the microprocessor-memory interface is worthwhile.

Processor-memory communications perform two tasks. Firstly, the processor fetches its program from the memory, one instruction at a time. Secondly, the data that the program is operating on is transferred back and forth. The embodiments described here focus on instruction fetch, which makes up the majority of the processor-memory communications, but the invention is by no means limited to instruction fetch.

The instruction fetch bus 6 is the bus on which instructions are communicated from the memory 4 to the processor 2. Embodiments described here aim to reduce the average Hamming distance on this bus, i.e. to reduce the average Hamming distance from one instruction to the next.

It is common for instruction sets to be redundant, i.e. for instructions to contain information that is ignored by the processor. In particular some instructions contain unused bits.

For example in the instruction set considered here all instructions are 32 bits long. There are three instruction formats:

Bits marked ‘X’ are unused bits. The other bits convey useful information to the processor 2.

In two of these formats, 31 of the 32 bits are used. In the third format, 26 of the 32 bits are used. The remaining six bits are completely ignored by the processor 2. Some further features of our illustrative instruction set mean that other bits will also sometimes be unused, but the exact details of this have been omitted for clarity.

Any or all of the unused bits can be assigned to a combination of ‘0’s or ‘1’s without changing the meaning of the program.

Many other common processor 2 instruction sets also have unused bits. Typically, existing compiler tool chains will set all of the unused bits to ‘0’.

Although the processor 2 ignores these bits, they still contribute to the average Hamming distance for instruction fetches. The embodiments described here assign values to unused bits in a way that reduces the Hamming distance and hence maximizes the power efficiency.

For example, consider the following sequence of three instructions:

-   A: x0111000011010001101000110100111
-   B: x00000001001111xxxxx01101100011
-   C: x0001101111000111100110011011100

Note the group of unused bits, marked ‘xxxxx’ in instruction B. A conventional system might set all unused bits to ‘0’ giving the following code:

-   A: 00111000011010001101000110100111
-   B: 00000000100111100000001101100011
-   C: 00001101111000111100110011011100

The Hamming distances between these instructions are:

-   From A to B: 16
-   From B to C: 22

One embodiment described here instead gives the unused bits the following values:

-   A: 00111000011010001101000110100111
-   B: 00000000100111101101001101100011
-   C: 00001101111000111100110011011100

In this case the Hamming distances have been reduced to:

-   From A to B: 13
-   From B to C: 21
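The four distances quoted for this example can be checked with a short script. The bit patterns are copied from the two fully assigned listings in the text; the helper name `hamming` and the constant names are illustrative only.

```python
def hamming(a: str, b: str) -> int:
    """Hamming distance between two equal-length bit strings."""
    assert len(a) == len(b)
    return sum(x != y for x, y in zip(a, b))


A = "00111000011010001101000110100111"
B_ZEROED = "00000000100111100000001101100011"  # unused bits set to '0'
B_FILLED = "00000000100111101101001101100011"  # unused bits chosen by the embodiment
C = "00001101111000111100110011011100"

print(hamming(A, B_ZEROED), hamming(B_ZEROED, C))  # 16 22
print(hamming(A, B_FILLED), hamming(B_FILLED, C))  # 13 21
```

The total number of bit transitions over the three fetches falls from 38 to 34 without any change to the used bits.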

This technique does not require any modifications to the microprocessor 2. Power is saved through only changing the program bit pattern.

A first embodiment of the invention uses the following method for assigning unused bits in a sequence of instructions:

-   For the first instruction in the program, set any unused bits to 0.
-   Considering each subsequent instruction in the program sequentially:
    -   Considering each bit in this instruction:
        -   If this bit is unused, assign the value of the corresponding bit from the previous instruction to it.

An example of this method will now be given using the following sequence of 8-bit instructions:

-   Bit number: 76543210
    -   01x01x01
    -   x001x010
    -   xx000x1x
    -   00x1x100
    -   1x001x00
    -   1x001x00
    -   00001x00
    -   01x0xx10
    -   10000xx1
    -   01x100x0
    -   1x1x00x0
    -   01x01000
    -   xx001x0x
    -   110x01x0

The changes to the bits of the first three instructions will now be given in detail.

The first instruction is 01x01x01.

-   Bit 7 of instruction is 0: Do nothing
-   Bit 6 of instruction is 1: Do nothing
-   Bit 5 of instruction is X: Set it to 0
-   Bit 4 of instruction is 0: Do nothing
-   Bit 3 of instruction is 1: Do nothing
-   Bit 2 of instruction is X: Set it to 0
-   Bit 1 of instruction is 0: Do nothing
-   Bit 0 of instruction is 1: Do nothing

After processing the first instruction it has been changed to 01001001.

The second instruction is x001x010.

-   Bit 7 of instruction is X: Set it to 0 (Copied from bit 7 of first instruction)
-   Bit 6 of instruction is 0: Do nothing
-   Bit 5 of instruction is 0: Do nothing
-   Bit 4 of instruction is 1: Do nothing
-   Bit 3 of instruction is X: Set it to 1 (Copied from bit 3 of first instruction)
-   Bit 2 of instruction is 0: Do nothing
-   Bit 1 of instruction is 1: Do nothing
-   Bit 0 of instruction is 0: Do nothing

After processing the second instruction it has been changed to 00011010.

The third instruction is xx000x1x.

-   Bit 7 of instruction is X: Set it to 0 (Copied from bit 7 of second instruction)
-   Bit 6 of instruction is X: Set it to 0 (Copied from bit 6 of second instruction)
-   Bit 5 of instruction is 0: Do nothing
-   Bit 4 of instruction is 0: Do nothing
-   Bit 3 of instruction is 0: Do nothing
-   Bit 2 of instruction is X: Set it to 0 (Copied from bit 2 of second instruction)
-   Bit 1 of instruction is 1: Do nothing
-   Bit 0 of instruction is X: Set it to 0 (Copied from bit 0 of second instruction)

After processing the third instruction it has been changed to 00000010.

The complete sequence of output instructions, after processing according to the method of the first embodiment, is given in the following table.

| Input instruction | Output instruction |
|---|---|
| 00X1X100 | 00010100 |
| 1X001X00 | 10001100 |
| 1X001X00 | 10001100 |
| 00001X00 | 00001100 |
| 01X0XX10 | 01001110 |
| 10000XX1 | 10000111 |
| 01X100X0 | 01010010 |
| 1X1X00X0 | 11110010 |
| 01X01000 | 01101000 |
| XX001X0X | 01001000 |
| 110X01X0 | 11000100 |
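The method of the first embodiment can be sketched as follows. This is a minimal illustration rather than a definitive implementation: the function name `fill_from_previous` and the use of strings with ‘x’ for unused bits are assumptions made for the example. The input is the 14-instruction sequence given earlier.

```python
def fill_from_previous(instructions):
    """Assign each unused bit ('x') the value of the same bit in the previous word."""
    filled = []
    previous = None
    for word in instructions:
        word = word.lower()
        if previous is None:
            # First instruction in the program: unused bits are set to 0.
            current = word.replace("x", "0")
        else:
            # Later instructions: copy the corresponding bit of the previous output.
            current = "".join(p if bit == "x" else bit
                              for bit, p in zip(word, previous))
        filled.append(current)
        previous = current
    return filled


PROGRAM = [
    "01x01x01", "x001x010", "xx000x1x", "00x1x100", "1x001x00",
    "1x001x00", "00001x00", "01x0xx10", "10000xx1", "01x100x0",
    "1x1x00x0", "01x01000", "xx001x0x", "110x01x0",
]

for source, result in zip(PROGRAM, fill_from_previous(PROGRAM)):
    print(source, "->", result)
```

The first three outputs match the detailed walk-through and the remaining eleven match the table.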

In this example, the mean inter-instruction Hamming distance after the unused bits have been assigned is 2.61. If all unused bits had been assigned to zero, then the mean inter-instruction Hamming distance would be 2.92, indicating a power saving of around 5%.
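The two means can be reproduced from the listed bit patterns. In the sketch below, `INPUT` is the original 14-instruction sequence and `FILLED` is the output of the first embodiment as given in the walk-through and the table; the helper names are illustrative. The sequence has 13 instruction-to-instruction steps, so the means are 34/13 (about 2.615, quoted in the text as 2.61) and 38/13 (about 2.92).

```python
def total_transitions(words):
    """Sum of Hamming distances between each word and the next one."""
    return sum(sum(p != q for p, q in zip(a, b))
               for a, b in zip(words, words[1:]))


INPUT = [
    "01x01x01", "x001x010", "xx000x1x", "00x1x100", "1x001x00",
    "1x001x00", "00001x00", "01x0xx10", "10000xx1", "01x100x0",
    "1x1x00x0", "01x01000", "xx001x0x", "110x01x0",
]
FILLED = [
    "01001001", "00011010", "00000010", "00010100", "10001100",
    "10001100", "00001100", "01001110", "10000111", "01010010",
    "11110010", "01101000", "01001000", "11000100",
]
ZEROED = [word.replace("x", "0") for word in INPUT]

steps = len(INPUT) - 1
print(total_transitions(FILLED), total_transitions(FILLED) / steps)
print(total_transitions(ZEROED), total_transitions(ZEROED) / steps)
```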

This method produces optimal results for straight-line code, i.e. code that has no branches in the flow-of-control. To take into account non-linear flow-of-control a more sophisticated method is required, as will be described below.

The following table shows an example of a fragment of pseudo high-level code and corresponding pseudo assembly language instructions:

| Pseudo high-level code | Pseudo assembly language | Instruction |
|---|---|---|
| if a=1 then | Compare a with 1 | (1) |
| b:=0; | Branch, if not equal, to L1 | (2) |
| else | set b to 0 | (3) |
| c:=1; | jump to L2 | (4) |
| end if; | L1: set c to 1 | (5) |
| d:=d+1; | L2: add 1 to d | (6) |
| e:=a; | set e to a | (7) |

There are two possible paths that control-flow can take through this code. If a is equal to 1 then the sequence of instructions executed is 1, 2, 3, 4, 6, 7. If a is not equal to 1 then the sequence is 1, 2, 5, 6, 7.

The simple algorithm presented above would assign unused bits as if the execution sequence were 1, 2, 3, 4, 5, 6, 7. This is not necessarily the optimal assignment for either of the actual execution sequences.

A second embodiment of the invention incorporates an unused bit assignment method that takes into account flow of control.

When an unused bit is both preceded and followed by used bits in the adjacent instructions, setting the unused bit to the value of either the preceding bit or the following bit will optimise Hamming distance. For example:

| Example | Preceding bit | Unused bit | Following bit |
|---|---|---|---|
| A | 0 | X | 1 |
| B | 1 | X | 1 |

In example A, the value of the preceding bit can be copied into the unused bit giving the bit-sequence 001, or the value of the following bit can be copied into the unused bit giving the bit-sequence 011. In both cases there is exactly one transition overall. In example B, whichever bit's value is copied into the unused bit it will be a 1, giving no transitions in either case.
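The claim that copying from either neighbour is optimal can be checked by enumerating both possible values of the unused bit. The helper name `transitions` is illustrative only.

```python
def transitions(bits: str) -> int:
    """Number of 0->1 or 1->0 changes along a sequence of bits on one wire."""
    return sum(a != b for a, b in zip(bits, bits[1:]))


for label, preceding, following in [("A", "0", "1"), ("B", "1", "1")]:
    for choice in "01":
        sequence = preceding + choice + following
        print(label, sequence, transitions(sequence))
```

In example A both choices give one transition. In example B the copied value 1 gives no transitions, whereas the value 0, which matches neither neighbour, would give two.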

The first embodiment described above always copies from the preceding instruction. A modification of the first embodiment could run in reverse and always copy from the following instruction. The method of the second embodiment may copy from either.

In the example, instruction 2 is a point of divergence, because the following instruction can be either instruction 3 or instruction 5. Instruction 6 is a point of convergence, because the preceding instruction can be either instruction 4 or instruction 5.

Instructions at points of convergence have more than one possible preceding instruction. If we used only preceding instructions to determine how to assign unused bits we would have to make a decision about which of the two possible preceding instructions to use. Instead in these cases we can use the following instruction as the basis for unused bit assignment. So in the example, unused bits in instruction 6 are filled in from instruction 7.

The method of the second embodiment is based on the following three rules:

-   1) If an instruction has more than one preceding instruction, i.e. it is at a point of convergence, assign unused bits based on the following instruction.
-   2) If an instruction has more than one following instruction, i.e. it is at a point of divergence, assign unused bits based on the preceding instruction.
-   3) If an instruction has exactly one preceding and exactly one following instruction, i.e. it is neither a point of convergence nor of divergence, then assign unused bits based on either the preceding or the following instruction.

The following table shows how this can be applied to the example shown above:

| Instruction | Possible preceding instructions | Possible following instructions | Assign unused bits based on |
|---|---|---|---|
| compare ‘a’ with 1 (1) | (None) | 2 | 2 |
| branch if not equal, to L1 (2) | 1 | 3, 5 | 1 |
| set ‘b’ to 0 (3) | 2 | 4 | 2 or 4 |
| jump to L2 (4) | 3 | 6 | 3 or 6 |
| L1: set ‘c’ to 1 (5) | 2 | 6 | 2 or 6 |
| L2: add 1 to ‘d’ (6) | 4, 5 | 7 | 7 |
| set ‘e’ to ‘a’ (7) | 6 | (None) | 6 |
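The three rules can be sketched as a small function over a control-flow graph. The dictionary `SUCCESSORS` transcribes the "possible following instructions" column of the table; the function names and the graph representation are assumptions made for this illustration.

```python
SUCCESSORS = {1: [2], 2: [3, 5], 3: [4], 4: [6], 5: [6], 6: [7], 7: []}


def predecessors(successors):
    """Invert the successor map to obtain the possible preceding instructions."""
    preds = {node: [] for node in successors}
    for node, following in successors.items():
        for nxt in following:
            preds[nxt].append(node)
    return preds


def assignment_sources(successors):
    """For each instruction, list the instructions its unused bits may be based on."""
    preds = predecessors(successors)
    sources = {}
    for node, following in successors.items():
        before = preds[node]
        if len(before) > 1 and len(following) > 1:
            sources[node] = []                  # not covered by the three rules
        elif len(before) > 1:
            sources[node] = list(following)     # rule 1: point of convergence
        elif len(following) > 1:
            sources[node] = list(before)        # rule 2: point of divergence
        else:
            sources[node] = before + following  # rule 3: either neighbour
    return sources


for node, based_on in assignment_sources(SUCCESSORS).items():
    print(node, based_on)
```

The printed lists reproduce the last column of the table. The first instruction has no predecessor and the last has no successor, so each is left with a single candidate.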

In a variant embodiment an instruction's unused bits are based on a non-adjacent instruction. For example, instruction (5) could be based on instruction (2). This will typically occur at the target of an unconditional branch. In an implementation of this algorithm, such an assignment may be less practical than an assignment from an adjacent instruction because it requires that the implementation compute the target of the branch. In some cases, this may even be impossible, particularly when the branch target is computed at run time.

The following are hypothetical machine code bit patterns corresponding to these instructions, including some unused bits:

| Instruction | Bit pattern | Number |
|---|---|---|
| Compare a with 1 | 00001X00 | (1) |
| Branch if not equal to L1 | 01X0XX10 | (2) |
| Set b to 0 | 10000XX1 | (3) |
| Jump L2 | 01X100X0 | (4) |
| Set c to 1 | 1X1X00X0 | (5) |
| Add 1 to d | 01X01000 | (6) |
| Set e to a | XX001X0X | (7) |

In accordance with the second embodiment, combining the above two tables gives the following assignments for unused bits, in which the arrows show how unused bits are assigned from adjacent instructions.

Where an arrow joins a used bit to an unused bit the unused bit can be assigned from that used bit. For example, bit 3 of instruction (2) can be assigned from bit 3 of instruction (1). When a used bit is connected to an unused bit, and that unused bit is connected in turn to other unused bits, the value can be propagated to all of them. For example, bit 1 of instruction (2) can be propagated to unused bit 1 in instructions (3) and (4).

Here is the complete list of the assignments in accordance with thesecond embodiment:

Some unused bits can remain unassigned after the method of the second embodiment has been carried out. This will occur when a bit is unused in all instructions between a point of convergence and a point of divergence. In the example, bit 2 of instructions (1) and (2) are unassigned for this reason. A third embodiment, given later, will “seed” the group of unused bits using a used bit in an adjacent instruction. In this example, bit 2 of instruction (2) could be seeded with a ‘0’ from instruction (3).

The mean inter-instruction Hamming distances for this example are as follows. Each column gives the mean Hamming distance with the unused bits set in the stated way.

| Instruction sequence | All set to ‘0’ (simple) | Set using first algorithm | Set taking flow of control into account |
|---|---|---|---|
| 1, 2, 3, 4, 6, 7 | 2.80 | 3.0 | 2.60 |
| 1, 2, 5, 6, 7 | 3.00 | 2.75 | 2.25 |
| Mean, assuming sequences are equally probable | 2.90 | 2.88 | 2.43 |

Difficulties arise when an instruction is both a point of convergence and a point of divergence. This will occur when a branch instruction leads to another branch instruction. The example in the following table illustrates this.

| Instruction | Code |
|---|---|
| | ... |
| (1) | jump L1 |
| | ... |
| (2) | jump L1 |
| (3) | ... |
| (4) | L1: branch if negative, to L2 |
| (5) | ... |
| | ... |
| (6) | L2: |

Possible paths through this code are:

-   (1), (4), (5)
-   (1), (4), (6)
-   (2), (4), (5)
-   (2), (4), (6)
-   (3), (4), (5)
-   (3), (4), (6)

Instruction (4) is a point of both convergence and divergence, and the embodiments described so far offer no instruction on which it should base any unused bits that it may contain.

In a further variant embodiment, unused bit assignment is based on the values of bits in the multiple preceding and following instructions. In general an optimal assignment requires knowledge of the relative probabilities of each path. However if all of the preceding instructions, or all of the following instructions, are known and have the same bit value an optimal assignment is still possible. The third embodiment given later simply copies a value from an adjacent instruction.
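One way to picture the probability-based variant is to choose, for each unused bit, the value that minimises the expected number of transitions on that wire. The sketch below is only an illustration of that idea: the function `best_bit`, the representation of neighbours as (bit, probability) pairs, and all of the probability figures are hypothetical and do not come from the specification.

```python
def best_bit(neighbours):
    """Pick the value for an unused bit that minimises expected transitions.

    neighbours: (bit, probability) pairs, one per possible preceding or
    following instruction, where probability is the chance that the fetch
    sequence passes between that instruction and this one.
    """
    cost_if_zero = sum(p for bit, p in neighbours if bit == 1)
    cost_if_one = sum(p for bit, p in neighbours if bit == 0)
    return 0 if cost_if_zero <= cost_if_one else 1


# Three possible predecessors and two possible successors (hypothetical values).
mixed = [(1, 0.5), (1, 0.3), (0, 0.2), (0, 0.6), (1, 0.4)]
print(best_bit(mixed))

# All predecessors agree on 1, so 1 is optimal whatever the successor split is.
agreeing = [(1, 0.5), (1, 0.3), (1, 0.2), (0, 0.6), (1, 0.4)]
print(best_bit(agreeing))
```

The second call illustrates the special case noted above: when every preceding instruction carries the same bit value, that value cannot be beaten, because the preceding side always contributes one full transition if the other value is chosen.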

Many processors incorporate an architectural feature called pipelining that affects the sequence in which instructions are fetched. In a pipelined processor, one or more instructions sequentially after a branch instruction will be fetched and then discarded if the branch is taken. For the purposes of this analysis these instruction fetches are as important as any other and need to be taken into account.

Considering the same example as before:

| Instruction | Number |
|---|---|
| Compare a with 1 | (1) |
| Branch if not equal to L1 | (2) |
| set b to 0 | (3) |
| Jump to L2 | (4) |
| L1: set c to 1 | (5) |
| L2: add 1 to d | (6) |
| set e to a | (7) |

In a non-pipelined processor, the possible execution sequences for this code are 1, 2, 5, 6, 7 and 1, 2, 3, 4, 6, 7. For a pipelined processor that fetches one extra instruction after taken branches the possible execution sequences are 1, 2, [3], 5, 6, 7 and 1, 2, 3, 4, [5], 6, 7, where [n] indicates the fetched-but-discarded instruction.

The second and third embodiments can function correctly for pipelined processors with only a minor change. “Points of divergence”, which were previously considered to be branch instructions, are now the instructions an appropriate distance after the branch instructions.
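The effect of pipelining on the fetch sequence, and the resulting shift of the points of divergence, can be sketched as follows. The sketch assumes that instruction numbers follow memory order, so that a taken branch shows up as a step to a non-consecutive number; both function names and the `extra_fetches` parameter are illustrative.

```python
def fetch_sequence(executed, extra_fetches=1):
    """Insert fetched-but-discarded instructions after each taken branch."""
    fetched = []
    for current, following in zip(executed, executed[1:] + [None]):
        fetched.append(str(current))
        if following is not None and following != current + 1:
            # Taken branch: the sequentially next instructions were already fetched.
            fetched.extend(f"[{current + k}]" for k in range(1, extra_fetches + 1))
    return fetched


def divergence_points(branches, extra_fetches=1):
    """Shift each point of divergence by the number of extra fetches."""
    return {branch + extra_fetches for branch in branches}


print(fetch_sequence([1, 2, 5, 6, 7]))
print(fetch_sequence([1, 2, 3, 4, 6, 7]))
print(divergence_points({2, 4}))
```

With one extra fetch, the instructions after which the fetch stream can go two ways are those immediately following the branches, rather than the branches themselves.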

The method of the third embodiment can be summarised by the following set of instructions:

Let C be the set of instructions that are at points of convergence. This means all instructions that are labelled as branch targets.

Let D be the set of instructions that are at points of divergence. This means all branch instructions, or if pipelining is being taken into account, all instructions that are the appropriate distance after a branch instruction.

Let E be the set of pairs (I, J) where I and J are instructions that satisfy either or both of the following conditions:

-   J is not an element of C, and I immediately precedes J
-   J is not an element of D, and I immediately follows J

(Note: Each element of E corresponds to an arrow in FIGURE, with I being the instruction at the tail of the arrow and J being the instruction at the head of the arrow).

    For each bit in turn:
      Let B be the bit-number of the current bit
      While there are any instructions where bit B is unused
        While there is an element (I, J) of E, such that bit B of instruction
        J is unused and bit B of I is used
          Set bit B of instruction J to the value of bit B of instruction I
        End-while
        If there are still any instructions where bit B is unused
          (Note: This step implements the “seeding” process mentioned earlier)
          Find any two instructions I and J, such that bit B of
          instruction I is used, bit B of J is unused, and I and J are
          adjacent instructions [for the purposes of this step, the
          first instruction in the program should be considered to be
          preceded by and the last instruction followed by an
          instruction containing all zeros]
          Set bit B of instruction J to the value of bit B of instruction I
        End-if
      End-while
    End-for
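The procedure above can be sketched in Python as follows. This is one possible reading, not a definitive implementation: instructions are strings in memory order with ‘x’ for unused bits, the sets C and D are passed as 0-based indices, and the seeding step (which the text leaves open with "find any two instructions") here takes the first unused instruction in memory order and prefers its predecessor. All names are illustrative. The input is the seven-instruction example with branch targets (5) and (6) and branches (2) and (4).

```python
def assign_unused_bits(words, convergence, divergence):
    """Fill unused bits ('x') using the set E of permitted copies, then seeding."""
    rows = [list(word.lower()) for word in words]
    n, width = len(rows), len(rows[0])
    # E: pairs (i, j) meaning instruction j may copy a bit from instruction i.
    pairs = []
    for j in range(n):
        if j > 0 and j not in convergence:
            pairs.append((j - 1, j))  # i immediately precedes j
        if j < n - 1 and j not in divergence:
            pairs.append((j + 1, j))  # i immediately follows j
    for b in range(width):  # b is a string position, i.e. one bus wire
        while any(row[b] == "x" for row in rows):
            changed = True
            while changed:  # propagate along E until nothing more can be set
                changed = False
                for i, j in pairs:
                    if rows[j][b] == "x" and rows[i][b] != "x":
                        rows[j][b] = rows[i][b]
                        changed = True
            for j in range(n):  # seeding: set one stranded bit, then re-propagate
                if rows[j][b] != "x":
                    continue
                above = rows[j - 1][b] if j > 0 else "0"      # virtual all-zero word
                below = rows[j + 1][b] if j < n - 1 else "0"  # virtual all-zero word
                seed = above if above != "x" else below
                if seed != "x":
                    rows[j][b] = seed
                    break
    return ["".join(row) for row in rows]


def mean_distance(words, path):
    """Mean Hamming distance along a path of 1-based instruction numbers."""
    steps = list(zip(path, path[1:]))
    total = sum(sum(p != q for p, q in zip(words[a - 1], words[b - 1]))
                for a, b in steps)
    return total / len(steps)


EXAMPLE = ["00001x00", "01x0xx10", "10000xx1", "01x100x0",
           "1x1x00x0", "01x01000", "xx001x0x"]
FILLED = assign_unused_bits(EXAMPLE, convergence={4, 5}, divergence={1, 3})
print(FILLED)
print(mean_distance(FILLED, [1, 2, 3, 4, 6, 7]))  # 2.6
print(mean_distance(FILLED, [1, 2, 5, 6, 7]))     # 2.25
```

Under these assumptions the two path means agree with the 2.60 and 2.25 quoted earlier for assignment that takes flow of control into account.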

1-21. (Cancelled)
22. A method of reducing the power consumption of a microprocessor system which comprises a microprocessor and a memory connected by at least one bus, said memory containing a plurality of data values, each represented by a number of bits, for transmission to said microprocessor via said at least one bus, and at least some of said data values containing unused bits, said method including assigning values to said unused bits in such a way as to reduce the Hamming distance between successive data values by a greater extent than setting all of said unused bits to an arbitrary predetermined value.
23. A method as claimed in claim 22, which further comprises the steps of: considering each bit of each data value in turn and, if a particular bit of the considered data value is unused, assigning to said particular bit the value of the corresponding bit from an adjacent data value.
24. A method as claimed in claim 23, wherein said adjacent data value is adjacent to said considered data value in said memory.
25. A method as claimed in claim 23, wherein said adjacent data value is adjacent to said considered data value in the sequence in which said data values are read from said memory.
26. A method as claimed in claim 23, wherein said adjacent data value precedes said considered data value.
27. A method as claimed in claim 23, wherein said adjacent data value follows said considered data value.
28. A method as claimed in claim 22, wherein said data values represent instructions to said microprocessor.
29. A method as claimed in claim 28, which further comprises the step of determining whether each instruction is a point of divergence.
30. A method as claimed in claim 29, wherein if said microprocessor employs pipelining, then points of divergence are considered to be the instructions an appropriate distance after branch instructions.
31. A method as claimed in claim 29, wherein for instructions which are points of divergence, assignment of unused bits in the instruction is based on the preceding instruction.
32. A method as claimed in claim 28, which further comprises the step of determining whether each instruction is a point of convergence.
33. A method as claimed in claim 32, wherein for instructions which are points of convergence, assignment of unused bits in the instruction is based on the following instruction.
34. A method as claimed in claim 28, wherein for instructions which have only one possible preceding instruction and one possible following instruction, assignment of unused bits in the instruction is based on either the preceding or the following instruction.
35. A method as claimed in claim 28, wherein in the case of an instruction which is both a point of divergence and a point of convergence, assignment of unused bits in the instruction is based on a consideration of bits in multiple preceding and following instructions.
36. A method as claimed in claim 35, wherein assignment of unused bits in the instruction is based on a consideration of the probabilities of different paths to and from said instruction.
37. A method as claimed in claim 35, wherein assignment of unused bits in the instruction is simply based on an adjacent instruction.
38. A method as claimed in claim 28, wherein if any unused bits remain unassigned after the method has been carried out, then these unused bits are seeded using the corresponding bits from an adjacent instruction.